Complex network study of Brazilian soccer players 
Although it is a very popular sport in many countries, soccer has not received much attention from the scientific community. In this paper, we study soccer from a complex network point of view. First, we consider a bipartite network with two kinds of vertices or nodes: the soccer players and the clubs. Real data were gathered from the 32 editions of the Brazilian soccer championship, comprising a total of 13,411 soccer players and 127 clubs. We find several interesting and perhaps unexpected results. The probability that a Brazilian soccer player has worked at N clubs or played M games shows an exponential decay, while the probability that he has scored G goals follows a power law. Now, if two soccer players who have worked at the same club at the same time are connected by an edge, then a new type of network arises (composed exclusively of soccer-player nodes). Our analysis shows that the degree distribution of this network decays exponentially. We determine the exact values of the clustering coefficient, the assortativity coefficient and the average shortest path length and compare them with those of the Erdos-Renyi and configuration models. The time evolution of these quantities is calculated and the corresponding results are discussed.
In the past few years there has been a growing interest in the study of complex networks. The boom has two reasons: the existence of interesting applications in several biological, sociological, technological and communications systems, and the availability of a large amount of real data.

Social networks are composed of people interacting with some pattern of contacts, such as friendship, business or sexual partnership. One of the best-known works in this area was carried out by Milgram, who first arrived at the concept of "six degrees of separation" and of the small world. Biological networks are those built by nature in its indefatigable fight to make life possible: the genetic regulatory network for the expression of a gene, blood vessels, food webs and metabolic pathways. Technological or communications networks are those constructed by man in his indefatigable fight to make life better: the electric power grid [4, 10], airline routes [10], railways, the Internet and the World Wide Web.

A network is a set of vertices or nodes together with some rule for connecting them by edges. The degree of a vertex is defined as the number of edges connected to that vertex. In order to characterize a network, six important quantities or properties can be calculated [1, 2]: the degree distribution, the clustering coefficient, the assortativity coefficient, the average shortest path length, the betweenness and the robustness to a failure or attack. The first four quantities appear in this work and their meanings are explained below.

In this report, we study a very peculiar network: the Brazilian soccer network. Using the information at our disposal [14], a bipartite network is constructed with two types of vertices: one composed of 127 clubs (teams) and the other formed by 13,411 soccer players. These correspond to the total number of clubs and soccer players that participated at some time in the Brazilian soccer championship during the period 1971-2002 [15]. Whenever a soccer player has been employed by a certain club, we connect them by an edge.
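The construction above amounts to an edge list between players and clubs. A minimal sketch, assuming the data are available as (player, club) employment records; the function names and the toy records are hypothetical and only illustrate how the player degree N and the club degree S would be counted:

```python
from collections import Counter, defaultdict

def bipartite_degrees(employments):
    """Return the player degrees N and club degrees S of the bipartite network.

    employments: iterable of (player, club) pairs. Repeated pairs are merged,
    because an edge only records that the player was ever employed by the club.
    """
    clubs_of = defaultdict(set)    # player -> set of clubs
    players_of = defaultdict(set)  # club -> set of players
    for player, club in set(employments):
        clubs_of[player].add(club)
        players_of[club].add(player)
    N = {p: len(cs) for p, cs in clubs_of.items()}
    S = {c: len(ps) for c, ps in players_of.items()}
    return N, S

def distribution(degrees):
    """Normalized histogram P(x) of a {vertex: degree} mapping."""
    counts = Counter(degrees.values())
    total = sum(counts.values())
    return {x: n / total for x, n in sorted(counts.items())}

# Hypothetical toy records, not the championship data.
records = [("p1", "A"), ("p2", "A"), ("p2", "B"), ("p3", "B"),
           ("p3", "C"), ("p3", "A"), ("p1", "A")]
N, S = bipartite_degrees(records)
P_N = distribution(N)
mean_N = sum(N.values()) / len(N)
```

Applied to the real records, `mean_N` would play the role of the average number of clubs per player discussed below.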




FIG. 1: Histogram of the number of clubs against the number of goals scored per match. Bins of size 0.1 were used. The full line corresponds to the fitted Gaussian curve. The average number of goals per match is equal to 1.00.

Figure 1 shows the number of clubs N_c versus G/M, the goals per match, defined as the total number of goals (G) scored by a club divided by the number of matches (M) played by that club. Clearly, the data are well fitted by a Gaussian curve centered at G/M ≈ 1.03. The Brazilian club with the best index is Sao Caetano (G/M = 1.73) and the worst is Colatina (G/M = 0.22).
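The index plotted in Figure 1 is a simple ratio per club. A minimal sketch, assuming per-club totals (G, M) are at hand, of how the histogram with bins of width 0.1 could be built. The Gaussian here is obtained from the sample mean and standard deviation, a moment estimate that is our simplification and not necessarily the fitting procedure used for Figure 1. All names and numbers are hypothetical:

```python
import math
import statistics
from collections import Counter

def goals_per_match(clubs):
    """clubs: {name: (goals G, matches M)} -> {name: G/M}; clubs with M = 0 are skipped."""
    return {name: g / m for name, (g, m) in clubs.items() if m > 0}

def histogram(values, width=0.1):
    """Number of clubs per bin [i*width, (i+1)*width), keyed by the bin's left edge."""
    bins = Counter(math.floor(v / width) for v in values)
    return {round(i * width, 10): n for i, n in sorted(bins.items())}

def moment_gaussian(values, width=0.1):
    """Gaussian for the histogram counts, with mean and spread taken from the data."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    amplitude = len(values) * width / (sigma * math.sqrt(2 * math.pi))
    return lambda x: amplitude * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical clubs, not real championship totals.
ratios = goals_per_match({"X": (25, 20), "Y": (13, 20), "Z": (29, 20), "W": (21, 20)})
hist = histogram(ratios.values())
curve = moment_gaussian(list(ratios.values()))
```

The amplitude scales the normal density by the number of clubs times the bin width, so the curve is directly comparable with the bin counts.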

Figure 2 plots the degree distributions for each kind of vertex of the bipartite network: players and clubs. The player probability P(N) exhibits an exponential decay with the player degree N. Naturally, N corresponds to the number of clubs for which a player has ever worked. We find an average ⟨N⟩ = 1.37. The most nomadic player is Dada Maravilha, with N = 11. The inset shows the club probability P(S) as a function of the club degree S. Unfortunately, its form cannot be inferred, perhaps because of the small number of clubs involved.
FIG. 2: Probability P(N) that a player has worked for N clubs. The full line corresponds to the fitted exponential curve. So, it is 190 times more probable to find someone who has played for only two clubs than for eight clubs. The inset is the degree distribution P(S) for the clubs.
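The 190-fold ratio quoted in the caption fixes the decay rate of the exponential. Assuming the fitted curve has the form P(N) ∝ 10^{-aN} (the symbol a is introduced here for illustration only), a short worked step gives:

```latex
\frac{P(2)}{P(8)} = \frac{10^{-2a}}{10^{-8a}} = 10^{6a} = 190
\quad\Longrightarrow\quad
a = \frac{\log_{10} 190}{6} \approx \frac{2.279}{6} \approx 0.38
```

Under this assumption, each additional club lowers the probability by a factor of about 10^{0.38} ≈ 2.4.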
FIG. 3: (a) Game probability P(M) versus the number M of matches played. (b) Cumulative distribution P_C(M) built from P(M). Fits appear as the full lines.

A rather surprising result comes out when we determine the probability P(M) that a soccer player has played a total of M games (regardless of the club). There is an elbow (see Figure 3a), or critical value, at M_c = 40 in the semi-log plot of P(M). As there is a lot of scatter, we also determine the corresponding cumulative distribution P_C(M). The latter distribution is very well fitted by two different exponentials: $P_C(M) = 0.150 + 0.857 \times 10^{-0.042M}$ for M < 40 and $P_C(M) = 0.410 \times 10^{-0.010M}$ for M > 40, as shown in Figure 3b. This implies that the original distribution P(M) also has two exponential regimes with the same exponents. The existence of the threshold M_c probably indicates that, after a player has found some fame or notoriety, it is easier for him to keep playing soccer. Surpassing this value is as if the player had gained some kind of "stability" in his job.
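Two small helpers make the reasoning above concrete. One builds the cumulative distribution from raw counts, taking P_C(M) as the fraction of players with at least M games (an assumption consistent with the fit approaching 1 at small M, since 0.150 + 0.857 ≈ 1). The other evaluates the piecewise fit quoted in the text and recovers P(M) by differencing. The function names and toy counts are hypothetical:

```python
def ccdf(counts):
    """counts: {value: number of players}. Returns P_C(x), the fraction of
    players whose value is at least x, for every observed x."""
    total = sum(counts.values())
    result, tail = {}, 0
    for x in sorted(counts, reverse=True):
        tail += counts[x]
        result[x] = tail / total
    return dict(sorted(result.items()))

def pc_games(M):
    """Piecewise exponential fit of P_C(M) quoted in the text (M_c = 40)."""
    if M < 40:
        return 0.150 + 0.857 * 10 ** (-0.042 * M)
    return 0.410 * 10 ** (-0.010 * M)

def p_from_cumulative(pc, M):
    """P(M) = P_C(M) - P_C(M + 1) for integer M."""
    return pc(M) - pc(M + 1)

# Below the threshold the additive constant cancels, so successive
# P(M) values shrink by the same factor 10**-0.042 as the fit.
ratio = p_from_cumulative(pc_games, 11) / p_from_cumulative(pc_games, 10)
```

The two branches nearly join at M_c = 40 (about 0.168 and 0.163), and the differencing step shows why P(M) inherits the exponents of P_C(M): the constant 0.150 drops out.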




FIG. 4: (a) Goals probability P(G) that a player has scored G goals. A randomly chosen Brazilian soccer player is ten times less likely to have scored 36 goals than 13 goals. The player with the highest score is Roberto Dinamite, with G = 186. (b) The corresponding cumulative probability distribution P_C(G).

As goals are the quintessence of soccer, we determine the probability P(G) that a player has scored G goals in the Brazilian championships. The result is shown in Figure 4a. Here again we find an intriguing threshold, at G_c = 10, separating regions with apparently two distinct power-law exponents. This kind of behavior has already been found in the context of scientific collaboration networks [17]. To verify whether this threshold really exists (since the tail is, once more, very scattered) we calculate the corresponding cumulative distribution P_C(G), plotted in Figure 4b. The curve we get resembles the one found in the network of collaborations in mathematics (see fig. 3.2(a) of ref. [1]), and it may correspond to a truncated power law or possibly to two separate power-law regimes. We have tried to fit P_C(G) with truncated power laws. However, our best result was obtained using two power laws with different exponents: $P_C(G) = -0.259 + 1.256\,G^{-0.500}$ for G < 10 and $P_C(G) = -0.004 + 4.454\,G^{-1.440}$ for G > 10. Notice that the additive constants in the power laws spoil the straight line expected in the log-log plot. Moreover, since the cumulative distribution of a power law decays with an exponent smaller by one ($P_C(G) \sim G^{-(\gamma-1)}$ when $P(G) \sim G^{-\gamma}$), it follows that $P(G) \sim G^{-1.5}$ and $P(G) \sim G^{-2.44}$ for G < 10 and G > 10, respectively. We conjecture that the origin of this threshold can be explained simply by the structure, position or distribution of the players on the soccer field. About two thirds of the eleven players line up in the defense or in the midfield, and players in these positions usually score less than those in the attack.
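The step from the cumulative fits to P(G) can be spelled out. Treating G as continuous (an approximation for integer goal counts), P_C(G) = Σ_{G'≥G} P(G') ≈ ∫_G^∞ P(G') dG', so P(G) ≈ -dP_C/dG and any additive constant c drops out:

```latex
\begin{aligned}
P_C(G) &= c + b\,G^{-\alpha}
  \;\Longrightarrow\;
  P(G) \approx -\frac{dP_C}{dG} = \alpha\, b\, G^{-(\alpha + 1)}, \\
\alpha &= 0.500 \;\Rightarrow\; P(G) \sim G^{-1.5} \quad (G < 10), \\
\alpha &= 1.440 \;\Rightarrow\; P(G) \sim G^{-2.44} \quad (G > 10).
\end{aligned}
```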

From the bipartite network (of players and teams), one can construct a unipartite network composed exclusively of soccer players: if two players were on the same team at the same time, they are connected by an edge. Let us call the resulting network the Brazilian soccer player (BSP) network. With this projection, we get a time-growing network that reflects acquaintances and possible social relationships between the players. Similar projections have already been carried out for the bipartite networks actor-film, director-firm and scientist-paper [17].
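A minimal sketch of this projection, assuming the data are (player, club, year) records and reading "at the same time" as "in the same championship year" (our assumption; the text does not specify the time resolution). The optional `until` argument, also hypothetical, keeps only records up to a given year, which is one way to obtain the growing network:

```python
from collections import defaultdict
from itertools import combinations

def project_players(roster, until=None):
    """Player-player edges of a BSP-style projection.

    roster: iterable of (player, club, year) records. Two players are linked
    if they appear for the same club in the same year. Records after `until`
    are ignored when `until` is given.
    """
    squads = defaultdict(set)  # (club, year) -> players in that squad
    for player, club, year in roster:
        if until is None or year <= until:
            squads[(club, year)].add(player)
    edges = set()
    for members in squads.values():
        # Sorted pairs make each undirected edge appear exactly once.
        edges.update(combinations(sorted(members), 2))
    return edges

# Hypothetical toy roster.
roster = [("p1", "A", 1990), ("p2", "A", 1990), ("p3", "A", 1991),
          ("p2", "B", 1991), ("p3", "B", 1991)]
edges = project_players(roster)
```

Each club-season becomes a complete subgraph (clique) in the projection, which is one reason to expect a high clustering coefficient.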

In the year 2002, the BSP network had 13,411 vertices (the soccer players) and 315,566 edges. The degree probability distribution P(k) can be easily calculated, and we obtain an average degree ⟨k⟩ = 47.1. The result is plotted in Figure 5.
FIG. 5: Degree distribution of the BSP network. The fitting curve (full line) has the exponential form $P(k) \sim 10^{-0.011k}$.
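The average degree follows directly from the numbers of vertices v and edges e, since every edge adds one to the degree of each of its two ends. The same numbers also reproduce the Erdos-Renyi connection probability quoted below, p = 2e / [v(v - 1)], which is the probability that makes the expected number of ER edges equal to e:

```python
v, e = 13411, 315566             # BSP network in 2002, from the text
mean_k = 2 * e / v               # <k> = 2e / v
p_er = 2 * e / (v * (v - 1))     # expected ER edges p * v(v-1)/2 equal e
print(round(mean_k, 1), round(p_er, 5))  # 47.1 0.00351
```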

Many other quantities of the BSP network can be evaluated precisely. One can measure, for example, the probability C_i that the first neighbors of a vertex i are also connected to each other. The average of this quantity over the whole network gives the clustering coefficient C, a relevant parameter in social networks. We find C = 0.79, which means that the BSP network is highly clustered. At this point, it is very interesting to compare the BSP network results with those of random graphs such as the Erdos-Renyi (ER) model [19] and the configuration model [1, 20]. We simulated an ER network (with the same size as the BSP network) in which the vertices are connected with probability 0.00351, which gives approximately the same number of edges for both networks. Still keeping the network size, we also simulated the configuration model using the fitting curve of Figure 5 as the given degree distribution. We see from Table I that both the ER and the configuration model have a small clustering coefficient, as would be expected for random networks.
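A minimal sketch of the clustering coefficient C as the average of the local values C_i, together with a naive G(n, p) generator for the ER comparison. Assigning C_i = 0 to vertices with fewer than two neighbors is one common convention and our assumption here; the helper names are hypothetical. The pair loop in `erdos_renyi` is O(n^2), so for 13,411 vertices a faster generator would probably be preferable:

```python
import random
from itertools import combinations

def adjacency(edges):
    """Undirected adjacency sets built from an edge list (isolated vertices are absent)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def local_clustering(adj, i):
    """C_i: fraction of pairs of neighbors of i that are themselves connected."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention for vertices with fewer than two neighbors
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def clustering(adj):
    """Network clustering coefficient C: average of C_i over all vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

def erdos_renyi(n, p, seed=0):
    """Edge list of a G(n, p) random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

toy = adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])  # triangle plus a pendant vertex
C_toy = clustering(toy)
C_er = clustering(adjacency(erdos_renyi(300, 0.05)))
```

For an ER graph, C is expected to be close to p, which is why random graphs with the BSP density give a small clustering coefficient.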
TABLE I: In the first column, v is the number of vertices, e is the number of edges, ⟨k⟩ is the mean connectivity, C is the clustering coefficient, A is the assortativity coefficient and D is the average shortest path length.



The assortativity coefficient A [1] measures the tendency of a network to connect vertices with the same or different degrees. If A > 0 (A < 0) the network is said to be assortative (disassortative), and non-assortative when A = 0. To determine A, we first need to calculate the joint probability distribution e_{jk}, which is the probability that a randomly chosen edge has vertices with degrees j and k at either end. Thus,

$$A = \frac{1}{\sigma_q^2} \sum_{jk} jk\,\left(e_{jk} - q_j q_k\right),$$

where $q_k = \sum_j e_{jk}$ and $\sigma_q^2 = \sum_k k^2 q_k - \left(\sum_k k\, q_k\right)^2$. The possible values of A lie in the interval -1 ≤ A ≤ 1. For the BSP network, we find A = 0.12, so it is an assortative network. This value coincides with that of the mathematics coauthorship network [21] and it is smaller than that of the configuration model (A = 0.46). The explanation is simple: although the vertices of the configuration model are in fact randomly connected, the given degree distribution constraint generates very strong correlations. These can be measured by the nearest-neighbors average connectivity of a vertex with degree k, K_nn(k) [22], which is plotted in Figure 6.
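For an undirected network, the formula for A is equivalent to the Pearson correlation coefficient between the degrees at the two ends of an edge, with every edge counted in both directions (so that e_{jk} is symmetric). A minimal sketch of that computation, plus K_nn(k) as the mean degree of a vertex's neighbors averaged over all vertices of degree k. The function names are hypothetical, and A is undefined when all edge ends have the same degree (zero variance):

```python
from collections import defaultdict

def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def assortativity(adj):
    """Degree assortativity A as the Pearson correlation over edge ends."""
    xs, ys = [], []
    for a, nbrs in adj.items():
        for b in nbrs:  # each undirected edge is visited as (a, b) and (b, a)
            xs.append(len(adj[a]))
            ys.append(len(adj[b]))
    m = len(xs)
    mean = sum(xs) / m  # same as sum(ys) / m, because the list is symmetric
    cov = sum(x * y for x, y in zip(xs, ys)) / m - mean ** 2
    var = sum(x * x for x in xs) / m - mean ** 2
    return cov / var

def knn(adj):
    """K_nn(k): average neighbor degree, averaged over all vertices of degree k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for a, nbrs in adj.items():
        k = len(nbrs)
        sums[k] += sum(len(adj[b]) for b in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

star = adjacency([(0, 1), (0, 2), (0, 3), (0, 4)])  # hub linked to four leaves
```

The star graph, where the hub links only to leaves, is perfectly disassortative (A = -1) and its K_nn(k) decreases with k; the BSP value A = 0.12 indicates a mild tendency in the opposite direction.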

TABLE II: Temporal evolution of some quantities of the BSP network. The meanings of the first column are the same as those of Table I.

FIG. 6: Nearest-neighbors average connectivity for the ER, BSP and configuration model networks.

Finally, we also determine the average shortest path length D between a given vertex and all the other vertices of the network. Taking the average of this quantity over all BSP network vertices, we get D = 3.29. In analogy with social networks, we can say that there are 3.29 degrees of separation between Brazilian soccer players or, in other words, that the BSP network is a small world.
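A minimal sketch of how D can be computed for an unweighted network: a breadth-first search from every vertex gives all hop distances, and D is their mean over ordered pairs of distinct vertices that can reach each other (restricting to reachable pairs is our assumption for networks with several components). Running a BFS from all 13,411 vertices in pure Python would be slow, so in practice one might sample source vertices; the function names are hypothetical:

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_shortest_path(adj):
    """D: mean distance over ordered pairs of distinct, mutually reachable vertices."""
    total, pairs = 0, 0
    for s in adj:
        d = bfs_distances(adj, s)
        total += sum(d.values())
        pairs += len(d) - 1  # exclude the source itself
    return total / pairs

path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # path graph 1-2-3-4
D_path = average_shortest_path(path)
```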

We can also study the time evolution of the BSP network. We have verified that this network is broken into many clusters in 1971 and 1972; after that there is only one component. In Table II, we observe an increasing mean connectivity ⟨k⟩. We can think of two reasons for that: players' professional careers are becoming longer and/or the transfer rate of players between teams is growing. On the other hand, the clustering coefficient decreases with time. Also in this case, there may be two possible explanations: the transfer rate of players between Brazilian clubs and the exodus of the best Brazilian players to foreign teams (which has increased, particularly in recent decades). Naturally, this kind of movement reduces the probability of forming cliques. From Table II, we also see that the BSP network is becoming more assortative with time. This seems to indicate a growing segregationist pattern, in which player transfers occur preferentially between teams of the same size. Finally, the values of the average shortest path length may suggest that it is size independent but, most probably, this conclusion is biased by the presence of only a few generations of players in the growing BSP network.
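The observation that the network splits into many clusters in 1971 and 1972 but forms a single component afterwards can be checked with a union-find pass over yearly snapshots. A minimal sketch, again assuming (player, club, year) records and linking players who share a club in the same year; the names and the toy roster are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def count_components(vertices, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return sum(1 for v in parent if find(v) == v)

def components_by_year(roster):
    """Components of the cumulative player network at the end of each year."""
    result = {}
    for year in sorted({y for _, _, y in roster}):
        squads = defaultdict(set)  # (club, season) -> players
        for player, club, y in roster:
            if y <= year:
                squads[(club, y)].add(player)
        players = set().union(*squads.values())
        edges = [pair for members in squads.values()
                 for pair in combinations(sorted(members), 2)]
        result[year] = count_components(players, edges)
    return result

roster = [("p1", "A", 1971), ("p2", "A", 1971), ("p3", "B", 1971),
          ("p4", "B", 1971), ("p2", "B", 1972), ("p3", "B", 1972)]
```

In the toy roster, the transfer of p2 to club B in 1972 merges the two 1971 squads into a single component.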

We hope that the work presented here may stimulate further research on this subject. Some open questions are, for instance, whether the results obtained for Brazilian soccer hold for other countries or, perhaps, for other sports.